Direct integration of microarrays for selecting informative genes and phenotype classification

Authors

  • Youngmi Yoon
  • Jongchan Lee
  • Sanghyun Park
  • Sangjay Bien
  • Hyun Cheol Chung
  • Sun Young Rha
Abstract

The ability to provide thousands of gene expression values simultaneously makes microarray data very useful for phenotype classification. A major constraint in phenotype classification is that the number of genes greatly exceeds the number of samples. We overcame this constraint in two ways: we increased the number of samples by integrating independently generated microarrays that had been designed with the same biological objectives, and we reduced the number of genes involved in the classification by selecting a small set of informative genes. This allowed us to make maximal use of the abundant microarray data being stockpiled by thousands of different research groups while improving classification accuracy. Our goal is to implement a feature (gene) selection method that is applicable to integrated microarrays and to build a highly accurate classifier that permits straightforward biological interpretation. In this paper, we propose a two-stage approach. First, we performed a direct integration of individual microarrays by transforming each expression value into a rank value within a sample, and identified informative genes by calculating the number of swaps needed to reach a perfectly split sequence. Second, we built a classifier, a parameter-free ensemble method, using only the pre-selected informative genes. Using our classifier, which was derived from large, integrated microarray sample datasets, we achieved high accuracy, sensitivity, and specificity in the classification of an independent test dataset. © 2007 Elsevier Inc. All rights reserved.
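The first stage described in the abstract can be illustrated with a minimal Python sketch. It is one possible reading of the abstract, not the authors' implementation: the function names (`rank_within_sample`, `swaps_to_split`, `gene_scores`), the toy data, the use of adjacent swaps, and the tie handling are all assumptions made for illustration. Each sample's expression values are replaced by within-sample ranks, so samples from differently scaled arrays become comparable. For each gene, the samples are then ordered by that gene's rank, and the score is the number of adjacent swaps needed to separate the two classes completely.

```python
from typing import List, Sequence


def rank_within_sample(values: Sequence[float]) -> List[int]:
    """Replace each expression value by its rank (1 = lowest) within one sample."""
    order = sorted(range(len(values)), key=lambda g: values[g])
    ranks = [0] * len(values)
    for rank, gene in enumerate(order, start=1):
        ranks[gene] = rank
    return ranks


def swaps_to_split(labels: Sequence[int]) -> int:
    """Minimum adjacent swaps that turn a 0/1 sequence into a perfect split.

    A perfect split is all 0s followed by all 1s, or the reverse.
    The swap count for one direction equals the number of inversions.
    """
    ones_seen = 0
    inversions = 0  # pairs in which a 1 appears before a 0
    for label in labels:
        if label == 1:
            ones_seen += 1
        else:
            inversions += ones_seen
    zeros = len(labels) - ones_seen
    # The opposite direction needs (all 0/1 pairs) minus the inversions.
    return min(inversions, ones_seen * zeros - inversions)


def gene_scores(samples: Sequence[Sequence[float]],
                labels: Sequence[int]) -> List[int]:
    """Score every gene; a lower score means a cleaner class separation.

    Assumption: samples with equal ranks keep their input order (stable sort).
    """
    ranked = [rank_within_sample(s) for s in samples]
    scores = []
    for g in range(len(samples[0])):
        order = sorted(range(len(samples)), key=lambda i: ranked[i][g])
        scores.append(swaps_to_split([labels[i] for i in order]))
    return scores


# Hypothetical toy data: four samples on very different scales, three genes.
example_samples = [
    [1.0, 9.0, 5.0],        # class 0
    [10.0, 30.0, 40.0],     # class 0
    [300.0, 100.0, 200.0],  # class 1
    [8.0, 2.0, 9.0],        # class 1
]
example_labels = [0, 0, 1, 1]
example_scores = gene_scores(example_samples, example_labels)
```

In this toy data the first two genes receive a score of 0 because their ranks separate the classes completely, while the third gene needs one swap. A real analysis would also have to decide how to treat tied ranks and unequal class sizes, which the abstract does not specify.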


Similar resources

Identification of Alzheimer disease-relevant genes using a novel hybrid method

Identifying genes underlying complex diseases/traits that generally involve multiple etiological mechanisms and contributing genes is difficult. Although microarray technology has enabled researchers to investigate gene expression changes, identifying pathobiologically relevant genes remains a challenge. To address this challenge, we apply a new method for selecting the disease-relevant gen...


Diagnosis of the disease using an ant colony gene selection method based on information gain ratio using fuzzy rough sets

With advances in metagenome data mining, research has become focused on microarrays. Microarray datasets contain a large number of genes, most of which are usually irrelevant to the output class; hence, gene selection (feature selection) is essential. Removing redundant genes increases the speed and accuracy of classification. After applying the gene se...


Selecting a Small Subset of Informative Genes from Gene Expression Data by Using a Modified Binary Particle Swarm Optimisation

Gene expression technology, especially microarrays, can be used to measure the expression levels of thousands of genes simultaneously in biological organisms. Gene expression data produced by microarrays are expected to be useful for cancer classification. To select a small subset of informative genes for cancer classification, many researchers have analysed the gene expression data using vario...


Improved Gene Selection for Classification of Microarrays

In this paper we derive a method for evaluating and improving techniques for selecting informative genes from microarray data. Genes of interest are typically selected by ranking genes according to a test-statistic and then choosing the top k genes. A problem with this approach is that many of these genes are highly correlated. For classification purposes it would be ideal to have distinct but ...


MOLECULAR STUDY OF PKD1 & PKD2 GENES BY LINKAGE ANALYSIS AND DETERMINING THE GENOTYPE/PHENOTYPE CORRELATIONS IN SEVERAL IRANIAN FAMILIES WITH AUTOSOMAL DOMINANT POLYCYSTIC KIDNEY DISEASE

Background: Autosomal dominant polycystic kidney disease (ADPKD) is an inherited disorder with genetic heterogeneity. Up to three loci are involved in this disease: PKD1 on chromosome 16p13.3, PKD2 on 4q21, and a third locus of unknown location. Methods: Here we report the first molecular genetic study of ADPKD and the existence of locus heterogeneity for ADPKD in the Iranian populatio...



Journal:
  • Inf. Sci.

Volume 178

Publication date: 2008